Partially Observable Multi-Sensor Sequential Change Detection: A Combinatorial Multi-Armed Bandit Approach
Authors
Abstract
Similar Articles
Combinatorial Multi-Objective Multi-Armed Bandit Problem
In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMOMAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously. In this setting, the goal of the learner is to choose an action at each time, whose reward vector is a linear combination of the reward vectors of the arms in the action, to learn the set of super Par...
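The combinatorial multi-objective setting sketched above can be made concrete with a small example. The arm reward vectors, the pair-of-arms action set, and the unit-weight linear combination below are all hypothetical choices for illustration; the snippet just enumerates actions and reports which reward vectors are Pareto-optimal.

```python
import numpy as np
from itertools import combinations

# Hypothetical mean reward vectors for 4 arms over 2 objectives.
arm_means = np.array([[0.8, 0.1],
                      [0.1, 0.8],
                      [0.2, 0.2],
                      [0.5, 0.5]])

# Actions (super arms) = unordered pairs of arms; an action's reward vector is
# the sum (a unit-weight linear combination) of its arms' reward vectors.
actions = list(combinations(range(len(arm_means)), 2))
vectors = {a: arm_means[list(a)].sum(axis=0) for a in actions}

def dominated(v, others):
    # v is Pareto-dominated if some other vector is >= everywhere and > somewhere.
    return any(np.all(w >= v) and np.any(w > v) for w in others)

pareto = [a for a, v in vectors.items()
          if not dominated(v, [w for b, w in vectors.items() if b != a])]
print("Pareto-optimal actions:", pareto)
```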
Material for "Combinatorial Multi-Armed Bandit"
We use the following two well-known bounds in our proofs. Lemma 1 (Chernoff-Hoeffding bound). Let $X_1, \dots, X_n$ be random variables with common support $[0,1]$ and $E[X_i] = \mu$. Let $S_n = X_1 + \dots + X_n$. Then for all $t \ge 0$, $\Pr[S_n \ge n\mu + t] \le e^{-2t^2/n}$ and $\Pr[S_n \le n\mu - t] \le e^{-2t^2/n}$. Lemma 2 (Bernstein inequality). Let $X_1, \dots, X_n$ be independent zero-mean random variables. If for all $1 \le i \le n$, $|X_i| \le k$...
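Because the lemma statements above are truncated, here is only a minimal Monte Carlo sanity check of the Chernoff-Hoeffding tail bound as reconstructed; the values of n, mu, and t are hypothetical.

```python
import numpy as np

# Monte Carlo check of Pr[S_n >= n*mu + t] <= exp(-2 t^2 / n)
# for i.i.d. Bernoulli(mu) samples (an illustrative choice with support [0, 1]).
rng = np.random.default_rng(0)
n, mu, t = 100, 0.5, 10.0
trials = 100_000

S_n = rng.binomial(n, mu, size=trials).astype(float)  # S_n = X_1 + ... + X_n
empirical = np.mean(S_n >= n * mu + t)                 # estimated tail probability
bound = np.exp(-2 * t**2 / n)                          # Chernoff-Hoeffding bound

print(f"empirical tail probability: {empirical:.4f}")
print(f"Chernoff-Hoeffding bound:   {bound:.4f}")      # empirical <= bound should hold
```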
Multi-Armed Bandit for Pricing
This paper is about the study of Multi–Armed Bandit (MAB) approaches for pricing applications, where a seller needs to identify the selling price for a particular kind of item that maximizes her/his profit without knowing the buyer demand. We propose modifications to the popular Upper Confidence Bound (UCB) bandit algorithm exploiting two peculiarities of pricing applications: 1) as the selling...
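For illustration only, here is a plain UCB1 sketch for posted-price selection, not the modified algorithm proposed in that paper; the candidate prices and the buyer demand model are hypothetical, and profits are rescaled to [0, 1] so the standard confidence term applies.

```python
import numpy as np

# Plain UCB1 over a set of candidate selling prices (arms). The paper's variant
# exploits pricing-specific structure that this generic version ignores.
rng = np.random.default_rng(1)
prices = np.array([2.0, 4.0, 6.0, 8.0])        # hypothetical candidate prices
buy_prob = np.array([0.9, 0.6, 0.35, 0.15])    # hypothetical demand, unknown to the seller

counts = np.zeros(len(prices))
mean_reward = np.zeros(len(prices))            # running mean of normalized profit

for t in range(1, 5001):
    if t <= len(prices):                       # initialization: pull each arm once
        a = t - 1
    else:
        ucb = mean_reward + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    sale = rng.random() < buy_prob[a]          # does the buyer accept the posted price?
    reward = prices[a] * sale / prices.max()   # profit, normalized to [0, 1]
    counts[a] += 1
    mean_reward[a] += (reward - mean_reward[a]) / counts[a]

print("best learned price:", prices[int(np.argmax(mean_reward))])
```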
Online Multi-Armed Bandit
We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In t...
Combinatorial Multi-Armed Bandit with General Reward Functions
In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions such as the max() function and nonlinear utility fun...
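A quick numerical illustration of the point that a nonlinear reward such as max() can depend on the arms' full distributions rather than on their means alone; the two arm distributions below are hypothetical.

```python
import numpy as np

# Two arms with the same mean (0.5) but different distributions give different
# expected rewards under the nonlinear super-arm reward max().
rng = np.random.default_rng(2)
n = 100_000

arm_const = np.full(n, 0.5)          # arm A: constant 0.5
arm_bern = rng.random(n) < 0.5       # arm B: Bernoulli(0.5), same mean

print(np.mean(np.maximum(arm_const, arm_bern)))   # super arm {A, B}: ~0.75
print(np.mean(np.maximum(arm_const, arm_const)))  # super arm of two constant arms: 0.5
```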
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2019
ISSN: 2374-3468,2159-5399
DOI: 10.1609/aaai.v33i01.33015733